skip to main content
US FlagAn official website of the United States government
dot gov icon
Official websites use .gov
A .gov website belongs to an official government organization in the United States.
https lock icon
Secure .gov websites use HTTPS
A lock ( lock ) or https:// means you've safely connected to the .gov website. Share sensitive information only on official, secure websites.


Search for: All records

Creators/Authors contains: "Thebaud, Thomas"

Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full text articles may not yet be available without a charge during the embargo (administrative interval).
What is a DOI Number?

Some links on this page may take you to non-federal websites. Their policies may differ from this site.

  1. The Interspeech 2025 speech emotion recognition in natural istic conditions challenge builds on previous efforts to advance speech emotion recognition (SER) in real-world scenarios. The focus is on recognizing emotions from spontaneous speech, moving beyond controlled datasets. It provides a framework for speaker-independent training, development, and evaluation, with annotations for both categorical and dimensional tasks. The challenge attracted 93 research teams, whose models significantly improved state-of-the-art results over competitive baselines. This paper summarizes the challenge, focusing on the key outcomes. We analyze top-performing methods, emerging trends, and innovative directions. We highlight the effectiveness of combining foundational models based on audio and text to achieve robust SER systems. The competition website, with leaderboards, baseline code, and instructions, is available at: https://lab-msp.com/MSP-Podcast_Competition/IS2025/. 
    more » « less
    Free, publicly-accessible full text available August 17, 2026
  2. Ensuring that technological advancements benefit all groups of people equally is crucial. The first step towards fairness is identifying existing inequalities. The naive comparison of group error rates may lead to wrong conclusions. We introduce a new method to determine whether a speaker verification system is fair toward several population subgroups. We propose to model miss and false alarm probabilities as a function of multiple factors, including the population group effects, e.g., male and female, and a series of confounding variables, e.g., speaker effects, language, nationality, etc. This model can estimate error rates related to a group effect without the influence of confounding effects. We experiment with a synthetic dataset where we control group and confounding effects. Our metric achieves significantly lower false positive and false negative rates w.r.t. baseline. We also experiment with VoxCeleb and NIST SRE21 datasets on different ASV systems and present our conclusions. 
    more » « less